Introduction

In this report, I compare publisher information, checkouts per month, and distribution type from 2018 to 2023.

I will be using the Seattle Public Library’s checkouts data to answer the following questions:

Summary Information

In my analysis of the Seattle Public Library’s checkouts data, I found that checkouts of CDs and DVDs have decreased steadily since 2018, while checkouts of books have fluctuated heavily over the same period. Checkouts of audiobooks and eBooks have increased steadily since 2018, with audiobooks being the most checked-out material type of 2023 so far.
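A trend like this can be read off a per-year total for each material type. The following is a minimal sketch rather than the code behind this report: it uses a small invented data frame, and the column names `MaterialType`, `CheckoutYear`, and `Checkouts` are assumptions about how the checkouts data is laid out.

```r
library(dplyr)

# Toy stand-in for the checkouts data. All values are invented for
# illustration; the column names are assumptions.
toy_checkouts <- data.frame(
  MaterialType = c("BOOK", "BOOK", "BOOK", "EBOOK", "EBOOK", "EBOOK"),
  CheckoutYear = c(2018, 2018, 2019, 2018, 2019, 2019),
  Checkouts    = c(10, 6, 8, 5, 9, 4)
)

# Sum checkouts for every material type and year, then sort so that each
# material type's totals can be read in chronological order.
yearly_by_type <- toy_checkouts %>%
  group_by(MaterialType, CheckoutYear) %>%
  summarise(total_checkouts = sum(Checkouts), .groups = "drop") %>%
  arrange(MaterialType, CheckoutYear)

print(yearly_by_type)
```

Setting `.groups = "drop"` returns an ungrouped result, which avoids carrying a leftover grouping into later steps.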

Results

The most checked-out item from 2017 to 2023 was Educated: A Memoir, with 17,817 checkouts.

The top 5 publishers by checkouts are:

The top item for each of the top 5 publishers is:
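Rankings of this kind can be computed with a grouped sum followed by `slice_max()`. The sketch below is illustrative only: the data frame is invented, the column names `Publisher`, `Title`, and `Checkouts` are assumptions, and `n_top` is set to 2 so the toy data stays small (the report uses the top 5).

```r
library(dplyr)

# Toy data: publisher names, titles, and counts are invented.
toy_checkouts <- data.frame(
  Publisher = c("A", "A", "A", "B", "B", "C"),
  Title     = c("t1", "t2", "t1", "t3", "t4", "t5"),
  Checkouts = c(5, 3, 6, 10, 2, 1)
)

n_top <- 2  # hypothetical parameter; use 5 for a top-5 ranking

# Total checkouts per publisher, keeping the n_top largest.
top_publishers <- toy_checkouts %>%
  group_by(Publisher) %>%
  summarise(total_checkouts = sum(Checkouts), .groups = "drop") %>%
  slice_max(total_checkouts, n = n_top, with_ties = FALSE)

# Most checked-out title within each of those publishers.
top_items <- toy_checkouts %>%
  semi_join(top_publishers, by = "Publisher") %>%
  group_by(Publisher, Title) %>%
  summarise(total_checkouts = sum(Checkouts), .groups = "drop") %>%
  group_by(Publisher) %>%
  slice_max(total_checkouts, n = 1, with_ties = FALSE) %>%
  ungroup()

print(top_publishers)
print(top_items)
```

`with_ties = FALSE` keeps exactly `n` rows even when totals are tied. Whether ties should instead be reported is a judgment call for the analysis.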

The Dataset

The dataset is collected and published by the Seattle Public Library. It covers items that were checked out more than 5 times from 2018 to 2023 and includes the following metadata: Material Type, Checkout Month/Year, Subject, Publisher, Publishing Year, Book Title, and ISBN. The data was collected by the library’s circulation department to help the library understand which types of materials the public checks out and to keep a record of which books are currently checked out. It includes 12 features and 816,354 observations.

We might need to consider the ethical questions of using a dataset that excludes certain types of materials. For example, the dataset excludes items checked out fewer than 5 times, which could be a large portion of the library’s collection. Additionally, we may need to consider what types of content are banned from libraries and why.

The data in this dataset is limited to the Seattle Public Library’s collection, so it represents only part of the Seattle population. Because items checked out fewer than 5 times are excluded, a potentially large portion of the library’s collection is also missing. Additionally, the data is not uniform and needs trimming to represent publisher information accurately. The subject category was also hard to narrow down, because publishers have multiple branches and the encoding is inconsistent.
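As an illustration of the kind of trimming this can involve, the sketch below normalizes publisher strings so that spelling variants collapse into one value. It is an assumption about what the cleanup could look like, not the procedure used for this report. The column name `Publisher` is assumed, the variant spellings are invented, and "Example House, Inc." is a hypothetical publisher.

```r
library(dplyr)
library(stringr)

# Toy data with invented spelling variants of the same publisher.
toy_checkouts <- data.frame(
  Publisher = c("Books on Tape,", "books on tape", " Books  on Tape ",
                "Example House, Inc."),
  Checkouts = c(3, 2, 1, 4)
)

# Lowercase, drop punctuation, and collapse repeated or stray whitespace.
cleaned <- toy_checkouts %>%
  mutate(
    publisher_clean = Publisher %>%
      str_to_lower() %>%
      str_remove_all("[[:punct:]]") %>%
      str_squish()
  )

# Totals are now computed per cleaned name rather than per raw string.
totals <- cleaned %>%
  group_by(publisher_clean) %>%
  summarise(total_checkouts = sum(Checkouts), .groups = "drop")

print(totals)
```

This only handles surface differences such as case, punctuation, and spacing. Merging a publisher's separate branches would need an explicit lookup table, which is not shown here.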

Your Choice

Finally, I made a faceted scatterplot that plots checkout year against the number of checkouts, with one panel per publisher. The color of each point represents the material type of the title. I chose a scatterplot because I wanted to see the relationship between checkout year, number of checkouts, and material type. A scatterplot shows all three variables at once and highlights anomalies and important outliers in the data.
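A chart with this structure can be sketched in ggplot2 as follows. This is a minimal example under stated assumptions rather than the plotting code used for the figure: the data frame is invented, and the column names `Publisher`, `CheckoutYear`, `Checkouts`, and `MaterialType` are assumed.

```r
library(ggplot2)

# Toy data: publishers, years, counts, and material types are invented.
toy_titles <- data.frame(
  Publisher    = rep(c("Publisher A", "Publisher B"), each = 4),
  CheckoutYear = rep(c(2018, 2019, 2020, 2021), times = 2),
  Checkouts    = c(120, 150, 400, 380, 90, 95, 110, 300),
  MaterialType = rep(c("BOOK", "EBOOK", "AUDIOBOOK", "EBOOK"), times = 2)
)

# One panel per publisher, checkout year on the x-axis, number of checkouts
# on the y-axis, and point color mapped to material type.
checkout_plot <- ggplot(
  toy_titles,
  aes(x = CheckoutYear, y = Checkouts, color = MaterialType)
) +
  geom_point() +
  facet_wrap(~ Publisher) +
  labs(
    x = "Checkout year",
    y = "Number of checkouts",
    color = "Material type"
  )

print(checkout_plot)
```

`facet_wrap()` gives every panel the same axes by default, so spikes can be compared directly across publishers.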

Books on Tape publishes only audiobooks, so it is not surprising that all of its checkouts are audiobooks. It is interesting that no two publishers spiked in the same year. Most notably, eBook and audiobook titles have been spiking since COVID, and the number of checkouts for certain titles in these formats is thousands above that of any other format during this period.